Multiprocessor design for SoCs
For many applications, allocating performance among all of the tasks in a
system-on-chip (SoC) design is much easier, and provides greater design
flexibility, with multiple CPUs than with just one control processor and
multiple blocks of logic. Multiple-processor design changes the role of
processors, making it possible to design programmability into many functions
while keeping power budgets under control.

The biggest advantage of using multiple processors as SoC task blocks is that
they're programmable, so changes can be made in software after the chip design
is finished. This means that complex state machines can be implemented in
firmware running on the processor, significantly reducing verification time. And
one SoC can often be used for multiple products, turning features on and off as
necessary.

Multiple-processor design promotes much more efficient use of memory blocks.
A multiple-processor-based approach makes most of the memories
processor-visible, processor-controlled, processor-managed, processor-tested and
processor-initialized. Additionally, this reduces overall memory requirements
while promoting the flexible sharing and reuse of on-chip memories.

But how do you pick the right embedded processors for multiple-CPU designs?
How do you partition your design to take maximum advantage of multiple
processors? How do you manage the software among all the processors? How do you
connect them and manage communications in the hardware?

Four techniques

At the conceptual level, the entire system can be treated as a constellation
of concurrent, interacting subsystems or tasks. Each task communicates with
other subsystems and shares common resources (memory, data structures, network
points). Developers start from a set of tasks for the system and exploit the
parallelism by applying a spectrum of techniques, including four basic
actions:
These methods interact with one another, so iterative refinement is often
essential, particularly as the design evolves.

When a system's functions are partitioned into multiple interacting function
blocks, there are several possible organizational forms or structures,
including:

Assigning tasks to processors

The process of determining the right number of processors cannot be separated
from the process of determining the right processor type and configuration.
Traditionally, a real-time computation task is characterized by a "MIPS
requirement": how many millions of execution cycles per second are required.

A control task needs substantially more cycles if it's running on a simple
DSP rather than a RISC processor. A numerical task usually needs more cycles
running on a RISC CPU than a DSP. However, most designs contain no more than two
types of processors, because mixing RISC processors and DSPs requires working
with multiple software development tools.

Configurable processors can be modified to provide 10 to 50 times higher
performance than general-purpose RISC processors. This often allows configurable
processors to be used for tasks that previously were implemented in hardware
using Verilog or VHDL. Staying with a single configurable processor family
allows the same software development tools to be shared for all the
processors.

Once the rough number and types of processors are known and tasks are
tentatively assigned to the processors, basic communications structure design
starts. The goal is to discover the least expensive communications structure
that satisfies the bandwidth and latency requirements of the tasks.

When low cost and flexibility are most important, a shared-bus architecture,
in which all resources are connected to one bus, may be most appropriate. The
glaring liability of the shared bus is long and unpredictable latency,
particularly when a number of bus masters contend for access to different shared
resources.

A parallel communications network provides high throughput with flexibility.
The most common example is a crossbar connection with a two-level hierarchy of
buses. Also, direct connections can be made when the communications among the
processors are well-understood and will not change.

Intertask communications are built on two foundations: the software
communications mode and the corresponding hardware mechanism. The three basic
styles of software communications among tasks are message passing, shared memory
and device drivers.

Message passing makes all communications among tasks overt. All data is
private to a task except when operands are sent by one task and received by
another. Message passing is generally easier to code than shared memory when the
tasks are largely independent but often harder to code efficiently with tightly
coupled tasks.

With shared-memory communications, only one task reads from or writes to the
data buffer in memory at a time, requiring explicit access synchronization.
Embedded-software languages, such as C, typically include features that ease
shared-memory programming.

The hardware-device-plus-software-device-driver model is most commonly used
with complex I/O interfaces, such as networks or storage devices. The device
driver model combines elements of message passing and shared-memory access.

Processors must interface with memories, I/O interfaces and RTL blocks. These
guidelines may help designers take better advantage of RAMs:

- Watch for contention latency in memory access. Increase memory width or
increase the number of memories that can be active to overcome contention
bottlenecks.
- Pay particular attention to tasks that must move data from off-chip
memory through the processor, and back to off-chip memory; these tasks can
quickly consume all available bandwidth.

The move toward multiple-processor SoC designs is very real. Multiple
processors are used in consumer devices ranging from low-cost inkjet printers to
cell phones. As designers get comfortable with a processor-based approach,
processors have the potential to become the next major building block for SoC
designs, and SoC designers will turn to a processor-centric design methodology
that promises to ease the ever-increasing hardware/software
integration dilemma.

Some complex issues arise when tasks are mapped to an SoC implementation.
Choosing to implement a specific task in a processor, in logic or in software
is a very important decision. There are two guidelines for mapping tasks to
processors:

Ashish Dixit (adixit@tensilica.com) is vice president of hardware engineering
at Tensilica Inc. (Santa Clara, Calif.).

Courtesy of EE Times
A5= 19 2005 (10:00 AM)
URL: http://www.embedded.com/showArticle.jhtml?articleID=170704502
Copyright 2005 © CMP Media LLC